A Comparison of Algorithms for Banded Matrix Multiplication
Abstract
We present and compare several methods for multiplying banded square matrices. Various storage schemes and their implementations are discussed. Of particular interest is an algorithm for multiplying matrices by diagonals, which always references contiguous matrix elements. Two blocked implementations also are presented. These specialized routines are attractive for multiplying matrices whose bandwidths are known to be small relative to the size of the matrices. Results from tests performed on a Cray-2, Cray Y-MP, and RS/6000 are given. It is shown that, for specialized applications, a substantial savings can be realized over the standard three-loop multiplication algorithm.
1. Motivation
During the development of the Invariant Subspace Decomposition Algorithm (ISDA) [3], which computes the eigenvalues and eigenvectors of matrices, we began an investigation into the issues relating to banded matrices. In particular, since the ISDA relies heavily on matrix multiplication to decompose the matrix into smaller subproblems, we are exploring ways to decrease the amount of time spent performing matrix multiplication. One way to accomplish this is to perform periodic reductions to banded form [5]. This allows us to work mostly with banded matrices and to perform only those portions of the matrix multiplications necessary to compute the elements within the resulting band. (By band or bandwidth, we mean the upper or lower bandwidth of a matrix, as defined in [2]. In this paper we deal only with square matrices that have the same upper and lower bandwidth, although all of the algorithms can be extended to accommodate differing upper and lower bandwidths, as well as non-square matrices.) On many architectures, dense matrix multiplication can be implemented to sustain computational speeds close to the achievable peak for the machine. Therefore, a banded variant of ISDA will be worthwhile only if banded matrix multiplication can be implemented efficiently.
We have, therefore, been investigating a variety of methods for performing this computational primitive. If the bandwidth of a square matrix can be kept small relative to the order of the matrix, then a substantial savings in arithmetic operations can be realized by performing a banded matrix multiplication instead of a dense matrix multiplication. If the bandwidths of the two matrices to be multiplied together are known in advance, then the bandwidth of the resulting matrix will be merely the sum of the bandwidths of the two original matrices. Since the bandwidths of the matrices used in the ISDA are always known, we should be …
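The operation-count and bandwidth claims above can be illustrated with a minimal sketch. It is not one of the paper's routines (those are described as diagonal-based and blocked, and their details are not reproduced here); it simply restricts the standard three-loop algorithm to the index ranges where both factors can be nonzero. The function names `make_banded`, `dense_matmul`, and `banded_matmul`, the dense list-of-lists storage, and the test values are all assumptions made for illustration.

```python
def make_banded(n, bw):
    """Hypothetical helper: n x n matrix that is zero wherever |i - j| > bw."""
    return [[float(i + 2 * j + 1) if abs(i - j) <= bw else 0.0
             for j in range(n)] for i in range(n)]


def dense_matmul(a, b):
    """Standard three-loop product. Returns (C, number of multiply-adds)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                ops += 1
    return c, ops


def banded_matmul(a, b, bw_a, bw_b):
    """Three-loop product restricted to the bands.

    Assumes a[i][k] == 0 when |i - k| > bw_a and b[k][j] == 0 when
    |k - j| > bw_b. Entries of the product outside |i - j| <= bw_a + bw_b
    are then zero, so they are never visited.
    Returns (C, number of multiply-adds).
    """
    n = len(a)
    bw_c = bw_a + bw_b  # bandwidth bound for the product
    c = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(max(0, i - bw_c), min(n, i + bw_c + 1)):
            # Only k with |i - k| <= bw_a and |k - j| <= bw_b can contribute.
            k_lo = max(0, i - bw_a, j - bw_b)
            k_hi = min(n - 1, i + bw_a, j + bw_b)
            for k in range(k_lo, k_hi + 1):
                c[i][j] += a[i][k] * b[k][j]
                ops += 1
    return c, ops


if __name__ == "__main__":
    n, bw_a, bw_b = 200, 2, 3
    a, b = make_banded(n, bw_a), make_banded(n, bw_b)
    c_dense, ops_dense = dense_matmul(a, b)
    c_band, ops_band = banded_matmul(a, b, bw_a, bw_b)
    print("results agree:", c_band == c_dense)
    print("multiply-adds (dense, banded):", ops_dense, ops_band)
```

The sketch still stores each matrix as a full square array, so it shows only the arithmetic savings; a compact band or diagonal storage scheme would be needed to save memory and to obtain the contiguous access pattern the abstract refers to.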
Similar resources
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...
Scalable BLAS 2 and 3 Matrix Multiplication for Sparse Banded Matrices on Distributed Memory MIMD Machines
In this paper, we present two algorithms for sparse banded matrix-vector and sparse banded matrix-matrix product operations on distributed memory multiprocessor systems that support a mesh and ring interconnection topology. We also study the scalability of these two algorithms. We employ systolic-type techniques to eliminate synchronization delay and minimize the communication overhead among pr...
An Accurate and Efficient Selfverifying Solver for Systems with Banded Coefficient Matrix
In this paper we discuss a selfverifying solver for systems of linear equations Ax = b with banded matrices A and the future adaptation of the algorithms to cluster computers. We present an implementation of an algorithm to compute efficiently componentwise good enclosures for the solution of a sparse linear system on typical cluster computers. Our implementation works with point as well as int...
Fast Structured Matrix Computations: Tensor Rank and Cohn-Umans Method
We discuss a generalization of the Cohn–Umans method, a potent technique developed for studying the bilinear complexity of matrix multiplication by embedding matrices into an appropriate group algebra. We investigate how the Cohn–Umans method may be used for bilinear operations other than matrix multiplication, with algebras other than group algebras, and we relate it to Strassen’s tensor rank ...
A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems
We investigate and compare stable parallel algorithms for solving diagonally dominant and general narrow-banded linear systems of equations. Narrow-banded means that the bandwidth is very small compared with the matrix order and is typically between 1 and 100. The solvers compared are the banded system solvers of ScaLAPACK [11] and those investigated by Arbenz and Hegland [3, 6]. For the diagon...